A Machine Learning Approach to Convert CCGbank to Penn Treebank
Abstract
Conversion between grammar frameworks matters both for comparing the performance of parsers built on them and for uncovering the essential nature of languages. This paper presents an approach that converts Combinatory Categorial Grammar (CCG) derivations to Penn Treebank (PTB) trees using a maximum entropy model. Compared with previous work, the presented technique makes the conversion practical by eliminating the need to develop mapping rules manually, and it achieves state-of-the-art results.
Similar resources
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide-coverage statistical parsers that ob...
Improving the complement/adjunct distinction in CCGbank
One of the challenges of adapting the Penn Treebank for a specific formalism is that the target annotation often requires information represented imperfectly or not at all in the original corpus. When this occurs, the information must either be guessed with heuristics, or annotated manually. Recently, a third option has become available, due to the release of resources that supplement the Penn ...
Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank
Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chines...
Extending CCGbank with Quotes and Multi-modal CCG
CCGbank is an automatic conversion of the Penn Treebank to Combinatory Categorial Grammar (CCG). We present two extensions to CCGbank which involve manipulating its derivation and category structure. We discuss approaches for the automatic re-insertion of removed quote symbols and evaluate their impact on the performance of the C&C CCG parser. We also analyse CCGbank to extract a multi-modal CC...
Exploration of the LTAG-Spinal Formalism and Treebank for Semantic Role Labeling
LTAG-spinal is a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) introduced by Shen (2006). The LTAG-spinal Treebank (Shen et al., 2008) combines elementary trees extracted from the Penn Treebank with Propbank annotation. In this paper, we present a semantic role labeling (SRL) system based on this new resource and provide an experimental comparison with CCGBank and a st...